Improving Information Retrieval Systems using Part of Speech Tagging
Authors
Abstract
The objective of Information Retrieval is to retrieve all relevant documents for a user query and only those relevant documents. Much research has focused on achieving this objective with little regard for storage overhead or performance. In this paper we evaluate the use of Part of Speech Tagging to reduce index storage overhead and improve the general speed of the system with only a minimal reduction in precision-recall measurements. We tagged 500 MB of the 1989 and 1990 Los Angeles Times document collection provided by TREC for parts of speech. We then experimented to find the most relevant part of speech to index. We show that 90% of precision-recall is achieved with 40% of the document collection's terms. We also show that this is an improvement in overhead with only a 1% reduction in precision-recall.
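The idea of indexing only selected parts of speech can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny pre-tagged documents, the Penn Treebank-style tags, the choice of nouns as the retained part of speech, and the helper names `build_index` and `posting_count` are all assumptions made for illustration. The abstract does not state which tagger was used or which part of speech proved most relevant.

```python
from collections import defaultdict

# Hypothetical pre-tagged documents: (token, tag) pairs with
# Penn Treebank-style tags. A real system would obtain these from a tagger.
TAGGED_DOCS = {
    "d1": [("the", "DT"), ("senator", "NN"), ("quickly", "RB"),
           ("signed", "VBD"), ("the", "DT"), ("bill", "NN")],
    "d2": [("a", "DT"), ("new", "JJ"), ("bill", "NN"),
           ("reached", "VBD"), ("the", "DT"), ("senate", "NN")],
}


def build_index(tagged_docs, keep_prefixes=None):
    """Build an inverted index (term -> set of document ids).

    If keep_prefixes is given, only tokens whose tag starts with one of
    those prefixes are indexed; otherwise every token is indexed.
    """
    index = defaultdict(set)
    for doc_id, tokens in tagged_docs.items():
        for token, tag in tokens:
            if keep_prefixes is None or tag.startswith(tuple(keep_prefixes)):
                index[token.lower()].add(doc_id)
    return dict(index)


def posting_count(index):
    """Total number of (term, document) postings, a rough proxy for index size."""
    return sum(len(postings) for postings in index.values())


full_index = build_index(TAGGED_DOCS)
# Assumption: keep nouns only (tags beginning with "NN").
noun_index = build_index(TAGGED_DOCS, keep_prefixes=("NN",))

print(len(full_index), posting_count(full_index))
print(len(noun_index), posting_count(noun_index))
```

Comparing the term and posting counts of the two indexes gives a rough sense of the storage trade-off the abstract describes; measuring the effect on precision and recall would additionally require queries and relevance judgments, which this sketch does not include.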
Similar resources
Improving Arabic Information Retrieval Systems Using Part of Speech Tagging
The objective of Information Retrieval is to retrieve all relevant documents for a user query and only those relevant documents. Much research has focused on achieving this objective with little regard for storage overhead or performance. In this paper we evaluate the use of Part of Speech Tagging to improve the index storage overhead and general speed of the system with only a minimal incremen...
Improving Persian Information Retrieval Systems Using Stemming and Part of Speech Tagging
With the emergence of vast resources of information, it is necessary to develop methods that retrieve the most relevant information according to needs. These retrieval methods may benefit from natural language constructs to boost their results by achieving higher precision and recall rates. In this study, we have used part of speech properties of terms as extra source of information about docum...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...
Glean: Using Syntactic Information in Document Filtering
In this paper, we describe a system called Glean, which is based on the idea that coherent text contains significant latent information, such as syntactic structure and patterns of language use, which can be used to enhance the performance of information retrieval systems. We propose an approach to increase the precision of information retrieval that makes use of syntactic information obtained u...
Improving Text Summarization Using Noun Retrieval Techniques
Text Summarization and categorization have always been two of the most demanding information retrieval tasks. Deploying a generalized, multifunctional mechanism that produces good results for both of the aforementioned tasks seems to be a panacea for most of the text-based, information retrieval needs. In this paper, we present the keyword extraction techniques, exploring the effects that part ...